Abstract: Cut-set bounds on achievable rates for network 
communication protocols are not in general tight. In this paper 
we introduce a new technique for proving converses for the 
problem of transmission of correlated sources in networks, that 
results in bounds that are tighter than the corresponding cut-set 
bounds. We also define the concept of "uncertainty region" 
which might be of independent interest. We provide a full 
characterization of this region for the case of two correlated 
random variables. The bounding technique works as follows: on 
one hand we show that if the communication problem is solvable, 
the uncertainty of certain random variables in the network with 
respect to imaginary parties that have partial knowledge of the 
sources must satisfy some constraints that depend on the network 
architecture. On the other hand, the same uncertainties have 
to satisfy constraints that only depend on the joint distribution 
of the sources. Matching these two leads to restrictions on the 
statistical joint distribution of the sources in communication 
problems that are solvable over a given network architecture. 



I. Introduction 

Consider a directed network with a source s and two sinks 
$t_1$ and $t_2$.¹ Suppose that the source observes i.i.d. copies of 
random variables X, Y jointly distributed according to p(x, y). 
Sink $t_1$ is interested in the i.i.d. copies of X, while sink $t_2$ is 
interested in the i.i.d. copies of Y. We consider the problem 
of reliable transmission to fulfill the demands of both sink 
nodes with probability converging to one as the number of 
i.i.d. observations of X, Y grows without bound. 

The cut-set bound says that if the demands of both sinks can 
be fulfilled, each of the cuts that separate s from $t_1$ must have 
capacity at least H(X), each of the cuts that separate s from 
$t_2$ must have capacity at least H(Y), and each of the cuts that 
separate s from $(t_1, t_2)$ must have capacity at least H(X, Y). 
The cut-set bound is known to be tight when $X = (M_0, M_1)$ 
and $Y = (M_0, M_2)$ for some mutually independent random 
variables $M_0, M_1, M_2$ [?], [?]. Another case is when X and 
Y are "linearly correlated" in the sense that one can express X 
and Y as $X = AU^m$ and $Y = BU^m$ for some random vector 
$U^m$ and matrices A and B, all taking values in a given field. 
Without loss of generality one can assume that the rows of A 
and B are linearly independent. By applying suitably chosen 

¹To convey the basic ideas in the simplest way, throughout this paper we 
assume that there are two sources. Generalization to more than two sources 
(sinks) is also possible. 



invertible linear transformations $T_1$ and $T_2$, we can write 

    $T_1 X = \begin{bmatrix} A_0 \\ A_1 \end{bmatrix} U^m, \qquad T_2 Y = \begin{bmatrix} A_0 \\ B_1 \end{bmatrix} U^m,$

where the rows of $A_0$, $A_1$ and $B_1$ are linearly independent. 
Because the linear transformations $T_1$ and $T_2$ are invertible, 
the communication task is to transmit the common message 
$A_0 U^m$ to both sinks, and the private messages $A_1 U^m$ and 
$B_1 U^m$ to the first and second sinks respectively. Clearly this 
problem reduces to the one mentioned above if $A_0 U^m$, $A_1 U^m$ 
and $B_1 U^m$ are mutually independent. Therefore the cut-set 
bound is also tight in such cases. 

However, in general when the joint distribution of X and 
Y is arbitrary the cut-set bound is not always tight. To 
go beyond the cut-set bound, we devise a new technique 
for proving converses for the problem of transmission of 
correlated sources over networks. We provide an example for 
which the cut-set bound is not tight, but the new converse 
is tight. Nonetheless, the problem of characterizing the joint distributions 
of the sources for which the communication problem is solvable 
over a given network remains open. One can refer 
to the several papers written on this topic for treatments of 
special cases of this problem (see for instance [?]-[?]). Some 
of these works discuss different settings in which separate 
source coding and network coding becomes either optimal or 
suboptimal. 

At the heart of our technique lies the concept of "uncertainty 
region" and how we relate it to networks. We define 
the uncertainty region as the set of all possible uncertainty 
vectors, where each vector captures the 
uncertainty of a given random variable from the perspective of 
different observers who have access to distinct but dependent 
sources. More precisely, given an arbitrary random variable 
K, a vector formed by listing the uncertainty left in K when 
conditioned on different subsets of i.i.d. copies $\{X^n, Y^n\}$, 
i.e. $[\frac{1}{n}H(K), \frac{1}{n}H(K|X^n), \frac{1}{n}H(K|Y^n), \frac{1}{n}H(K|X^n, Y^n)]$, 
is called an uncertainty vector. Since the statistical dependence 
between the sources affects the uncertainty region in a crucial 
way, our discussion of correlated sources here is not a 
straightforward extension of the case of independent sources. 
Our technique also differs from those developed by Kramer et 
al. [?], Harvey et al. [?] and Thakor et al. [?], all of which 
concern transmission of independent sources over networks. 

The rest of the paper is organized as follows. In Section II 
we motivate our new technique. Section III contains one of 
the main results of this article, a complete characterization of 
the uncertainty region. Section IV uses this region to write 
converses, and Section V includes the proofs. 

II. Motivation 

This section motivates our technique which is based on 
uncertainty computations. For the ease of exposition and to 
convey the main ideas, discussions in this section will be 
quite intuitive and not rigorous. A precise discussion will be 
provided later. 

Let us begin with the well-known butterfly network shown 
in Figure 1. Assume that the source observes n i.i.d. 
repetitions of the correlated binary sources (X, Y). Thus the 
source has the length-n vectors $X^n$ and $Y^n$. 
The first sink is interested in recovering the n i.i.d. repetitions 
of X, whereas the second sink is interested in recovering 
the n i.i.d. repetitions of Y. Probabilities of error at both 
sinks are required to converge to zero as the number of i.i.d. 
observations of X, Y grows without bound. For the sake of 
simplicity we restrict ourselves to networks such that the cut 
towards the first receiver across edges 4 and 6, and the cut 
towards the second receiver across edges 5 and 6, are tight; 
that is, $C_4 + C_6 = H(X)$ and $C_5 + C_6 = H(Y)$. Let K 
denote the random variable that is put on edge 6 as shown 
in Figure 1. Using the source coding theorem and the fact 
that $C_4 + C_6 = H(X)$, one can conclude that $H(K|X^n)$ 
ought to be negligible if the demand of the first sink is to be 
fulfilled. Similarly $H(K|Y^n)$ ought to be negligible. Therefore 
K corresponds to common randomness between $X^n$ and $Y^n$ 
in the sense of Gács-Körner [3]. This common information 
is equal to max H(T), where the maximum is over random 
variables T that are functions of X and also functions of Y. 
For binary sources this common information is non-zero 
if and only if X = Y or X = 1 - Y. Thus in the general 
case, the Gács-Körner common information for binary random 
variables is zero, implying that $\frac{1}{n}H(K)$ should be almost 
zero. This effectively implies that we are not using edge 6 
in communication at all. But the cuts at the two sinks were 
tight, implying that $C_4 < H(X)$ and $C_5 < H(Y)$. There is not 
enough rate to communicate $X^n$ and $Y^n$ through these links. 
This implies that the required communication demands cannot 
be simultaneously satisfied. Note that because even a small 
perturbation in the joint distribution can destroy the 
Gács-Körner common information between two random variables, a 
given network that supports transmission of certain correlated 
sources may not support transmission of correlated sources in 
its immediate vicinity, a discontinuity-type phenomenon. 
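
The discontinuity can be checked numerically. The following is a minimal illustrative sketch, not part of the original development: it computes the Gács-Körner common information of a finite joint distribution using the standard characterization that the maximal common function T labels the connected components of the bipartite graph joining x and y whenever p(x, y) > 0. The function name `gacs_korner` and the dictionary representation of p(x, y) are assumptions made for this example.

```python
from math import log2

def gacs_korner(p):
    """Gacs-Korner common information (bits) of a joint pmf.

    p maps (x, y) pairs to probabilities. The common part T is taken to be
    the connected component of the bipartite support graph containing (x, y).
    """
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Union the x-node and the y-node of every pair with positive mass.
    for (x, y), mass in p.items():
        if mass <= 0:
            continue
        for node in (("x", x), ("y", y)):
            parent.setdefault(node, node)
        parent[find(("x", x))] = find(("y", y))

    # Accumulate the probability of each component, then take its entropy.
    comp_mass = {}
    for (x, y), mass in p.items():
        if mass > 0:
            root = find(("x", x))
            comp_mass[root] = comp_mass.get(root, 0.0) + mass
    return -sum(m * log2(m) for m in comp_mass.values() if m > 0)

# X = Y uniform: the support graph has two components.
identical = {(0, 0): 0.5, (1, 1): 0.5}
# A small perturbation connects the support graph into one component.
delta = 1e-3
perturbed = {(0, 0): 0.5 - delta, (1, 1): 0.5 - delta,
             (0, 1): delta, (1, 0): delta}
gk_identical = gacs_korner(identical)
gk_perturbed = gacs_korner(perturbed)
```

Under these assumptions the first distribution has one bit of common information, while the perturbed one has (numerically) none, however small `delta` is chosen.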

Our second example is again based on the butterfly network, 
shown in Figure 2 with a passive eavesdropper on one of the nodes. 
The eavesdropper can observe the random 
variable K but cannot tamper with any of the messages. The 
goal of the code is to keep the eavesdropper almost ignorant of 
the message of the first sink. That is, we restrict 
our attention to those codes in which K is almost independent 
of $X^n$. Further, assume that the cut at the second sink is tight, 
i.e., $C_5 + C_6 = H(Y)$. We claim that one must then have 
$C_4 \ge H(X)$, $C_6 \le H(Y|X)$ and $C_5 \ge I(X;Y)$. Otherwise, the 
sources are not transmittable. 

To see this, take a code of length n. Let L and R respectively 
denote the messages that are put on the edges with capacities 
$C_4$ and $C_5$. We have 

    $nC_4 \ge H(L) \ge I(L; X^n | K) \overset{(a)}{=} I(LK; X^n) \overset{(b)}{=} H(X^n) = nH(X).$

Approximation (a) is a 
consequence of the fact that K is almost independent of $X^n$, 
and (b) follows from the fact that $X^n$ should (with high 
probability) be recoverable from L and K. Therefore $C_4 \ge H(X)$. 
Since $C_5 + C_6 = H(Y)$, that is, the cut at the second sink is 
tight, both K and R must essentially be functions of $Y^n$. Thus 
we have $H(K) = I(K; Y^n | X^n) \le H(Y^n | X^n) = nH(Y|X)$. 
Thus if $C_6 > H(Y|X)$, the inequality $H(K) \le nH(Y|X)$ 
implies that the edge with capacity $C_6$ is not fully used. But 
since $C_5 + C_6 = H(Y)$ and $Y^n$ is recoverable (with high 
probability) from R and K, one must fully exploit the edge 
with capacity $C_6$. This is a contradiction. The bound 
$C_5 \ge I(X;Y)$ then follows from $C_5 = H(Y) - C_6$. 

These two examples can be recast in the same 
language if one considers the "uncertainty" vector 
$[\frac{1}{n}H(K), \frac{1}{n}H(K|X^n), \frac{1}{n}H(K|Y^n), \frac{1}{n}H(K|X^n, Y^n)]$, 
i.e. the vector formed by listing the uncertainty left in K 
when conditioning on different subsets of $\{X^n, Y^n\}$. In the first 
example, each of $X^n$ and $Y^n$ is almost sufficient to determine 
K. Thus, the last three coordinates of the uncertainty vector 
are almost zero. The Gács-Körner common information 
can therefore be reinterpreted as providing an upper bound for the 
first coordinate of the uncertainty vector when all the other 
coordinates are zero. In the second example, the secrecy 
constraint of K being almost independent of $X^n$ imposes 
the constraint that the first and the second coordinates of the 
uncertainty vector are equal. The fact that K is a function of 
$Y^n$ implies that the third and the fourth coordinates are almost 
zero. Thus the uncertainty vector is of the form [a, a, 0, 0]. 
The constraint $C_6 \le H(Y|X)$ can be interpreted as saying 
that the maximum value of a such that the uncertainty vector 
[a, a, 0, 0] is plausible is a = H(Y|X). 

III. The Uncertainty Region 

The above section motivates the definition of the uncertainty 
region. In this section we formally define this region and 
then provide a complete characterization of it. In the next 
section we discuss the use of the uncertainty region in proving 
converses. 

Given a joint distribution p(x, y) on discrete random variables 
X and Y, let us define a four-dimensional uncertainty 
region, U(p), as the closure of the set of non-negative 4-tuples 
$(u_1, u_2, u_3, u_4)$ such that for some n and $p(k|x^n, y^n)$ we have 

    $u_1 = \frac{1}{n}H(K), \qquad u_2 = \frac{1}{n}H(K|X^n),$
    $u_3 = \frac{1}{n}H(K|Y^n), \qquad u_4 = \frac{1}{n}H(K|X^n, Y^n).$


Intuitively speaking, the coordinates of this vector are the 
uncertainties of K when i.i.d. copies of a subset of variables 
X and Y are available. We are interested in the set of 
all plausible uncertainty vectors. Note that we define the 
uncertainty region in terms of p(x, y) alone, irrespective of 
the network architecture. 
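
As an illustration of the definition, the sketch below evaluates an uncertainty vector for the single-letter case n = 1. It is only a numerical aid under stated assumptions: the helper names `entropy`, `marginal` and `uncertainty_vector`, and the dictionary encoding of p(x, y) and p(k|x, y), are hypothetical choices made for this example and do not appear in the original text.

```python
from math import log2
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def marginal(joint, keep):
    """Marginalize a pmf over tuples onto the coordinate indices in keep."""
    out = defaultdict(float)
    for outcome, q in joint.items():
        out[tuple(outcome[i] for i in keep)] += q
    return out

def uncertainty_vector(p_xy, p_k_given_xy):
    """Return [H(K), H(K|X), H(K|Y), H(K|X,Y)] in bits for n = 1."""
    joint = {}  # pmf over (k, x, y)
    for (x, y), q in p_xy.items():
        for k, r in p_k_given_xy[(x, y)].items():
            if q * r > 0:
                joint[(k, x, y)] = q * r
    h = lambda idx: entropy(marginal(joint, idx))
    return [h((0,)),
            h((0, 1)) - h((1,)),        # H(K,X) - H(X)
            h((0, 2)) - h((2,)),        # H(K,Y) - H(Y)
            h((0, 1, 2)) - h((1, 2))]   # H(K,X,Y) - H(X,Y)

# Doubly symmetric binary source with crossover 0.1, and K = X xor Y.
p_xy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
p_k = {(x, y): {x ^ y: 1.0} for (x, y) in p_xy}
vec = uncertainty_vector(p_xy, p_k)
```

For this choice K is a deterministic function of (X, Y), so the last coordinate vanishes, and the first three coordinates all equal the binary entropy of 0.1, which here coincides with H(Y|X) = H(X|Y).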

We now fully characterize the uncertainty region. The proof 
is provided in Section V. 

Theorem 1: The region U(p) is equal to the convex envelope 
of the union of the following four sets of points. The first 
set is the union over all $c \ge 0$ and $p(e|x,y)$ of non-negative 
4-tuples $(u_1, u_2, u_3, u_4)$ where 

    $u_1 = c + I(E; X, Y),$
    $u_2 = c + I(E; Y|X),$
    $u_3 = c + I(E; X|Y),$
    $u_4 = c.$

The second set of points is the union over all $c \ge 0$ of 4-tuples 
$(u_1, u_2, u_3, u_4)$ where 

    $u_1 = c + H(Y|X),$
    $u_2 = c + H(Y|X),$
    $u_3 = c,$
    $u_4 = c.$

The third set of points is the union over all $c \ge 0$ of 4-tuples 
$(u_1, u_2, u_3, u_4)$ where 

    $u_1 = c + H(X|Y),$
    $u_2 = c,$
    $u_3 = c + H(X|Y),$
    $u_4 = c.$

The fourth set of points is the union over all $c \ge 0$ and 
$0 \le f \le \max(H(X|Y), H(Y|X))$ of non-negative 4-tuples 
$(u_1, u_2, u_3, u_4)$ where 

    $u_1 = c + f,$
    $u_2 = c + \min(f, H(Y|X)),$
    $u_3 = c + \min(f, H(X|Y)),$
    $u_4 = c.$

Remark 1: One can use the strengthened Carathéodory theorem 
of Fenchel [?] to prove a cardinality bound of $|\mathcal{X}||\mathcal{Y}| + 2$ 
on the auxiliary random variable E in the first set of points. 

Although the above theorem characterizes the region, the 
following outer bound is useful in some instances. The extreme 
points of this outer bound belong to the first set of points of 
the above theorem. 

Theorem 2: The uncertainty region is a subset of the union 
over all $c, g, h \ge 0$ and $p(e|x,y)$ of 4-tuples $(u_1, u_2, u_3, u_4)$ 
where 

    $u_1 = c + I(E; XY),$
    $u_2 = c + I(E; Y|X) + g,$
    $u_3 = c + I(E; X|Y) + h,$
    $u_4 = c.$

IV. Writing Converses Using the Uncertainty 
Region 

Take an arbitrary directed network $\mathcal{N}$ with a source s and 
two sinks $t_1$ and $t_2$. Suppose that the source observes i.i.d. 
copies of X, Y jointly distributed according to p(x, y). Sink 
$t_1$ is interested in the i.i.d. copies of X, while sink $t_2$ is 
interested in the i.i.d. copies of Y. The capacity of an edge 
e is denoted by $C_e$. An $(n, \epsilon)$ code for this network consists 
of a set of encoding functions at the intermediate nodes such 
that $X^n$ and $Y^n$ can be recovered at the first and second sinks 
respectively with probabilities of error less than or equal to $\epsilon$, 
and furthermore the number of bits passed on a given edge e 
is at most $n(C_e + \epsilon)$. 

In order to write a converse for $\mathcal{N}$ we take the edges one 
by one and write a converse for that particular edge. At the 
end we intersect all such converses. 

Take an $(n, \epsilon)$ code. Take a particular edge e and 
let K denote the random variable that is put on the 
edge e. The idea is to find as many constraints as 
possible on the uncertainty vector associated with K, i.e. 
$[\frac{1}{n}H(K), \frac{1}{n}H(K|X^n), \frac{1}{n}H(K|Y^n), \frac{1}{n}H(K|X^n, Y^n)]$. Let us 
denote the first coordinate $\frac{1}{n}H(K)$ by $d_e$, the 
entropy rate of the random variable on edge e. This $d_e$ is 
required to satisfy $0 \le d_e \le C_e + \epsilon$. Every cut that has the 
edge e and separates the source from the first sink imposes a 
constraint on $\frac{1}{n}H(K|X^n)$ as follows. 

Lemma 1: Take an arbitrary cut (containing e) from the 
source to the first sink, and let $Cut_x$ denote the sum of the 
capacities of the edges on this cut. Then $\frac{1}{n}H(K|X^n)$ must 
satisfy the following inequality: 

    $\frac{1}{n}H(K|X^n) \le Cut_x - C_e + d_e - H(X) + k(\epsilon)$

for some function $k(\epsilon)$ that converges to zero as $\epsilon$ converges 
to zero. 

Proof: Let Q denote the collection of random variables 
passing over the edges of the cut (except e). As shown in 
[?], $\frac{1}{n}H(Q) \le Cut_x - C_e + m\epsilon$, where m is the number 
of edges in the graph. Since (Q, K) is the collection of the 
random variables passing over the edges of the cut, $X^n$ should be 
recoverable from (Q, K) with probability of error less than or 
equal to $\epsilon$. Thus, by Fano's inequality, $\frac{1}{n}H(X^n|Q, K) \le k_1(\epsilon)$ 
for some function $k_1(\epsilon)$ that converges to zero as $\epsilon$ converges 
to zero. We have 

    $\frac{1}{n}H(K|X^n) \le \frac{1}{n}H(Q, K|X^n) = \frac{1}{n}H(Q, K, X^n) - \frac{1}{n}H(X^n)$
    $\le \frac{1}{n}H(Q) + \frac{1}{n}H(K) + \frac{1}{n}H(X^n|Q, K) - H(X)$
    $\le Cut_x - C_e + m\epsilon + d_e - H(X) + k_1(\epsilon).$

We get the desired result by setting $k(\epsilon) = k_1(\epsilon) + m\epsilon$. ■ 

Other restrictions on $\frac{1}{n}H(K|X^n)$ may come from secrecy 
constraints. For instance, if K is observed by an eavesdropper 
and there is an equivocation rate constraint on how much the 
eavesdropper can learn about $X^n$, say $\frac{1}{n}I(K; X^n) \le R$, we 
can conclude that $\frac{1}{n}H(K|X^n) \ge \frac{1}{n}H(K) - R = d_e - R$. 

One can use similar ideas to impose constraints on 
$\frac{1}{n}H(K|Y^n)$. 

If there is no secrecy constraint, without loss of generality 
we assume that K is a function of $(X^n, Y^n)$, as randomized 
coding would only reduce the throughput. Thus the last 
coordinate $\frac{1}{n}H(K|X^n, Y^n)$ will be zero. The following lemma 
(whose proof is similar to that of Lemma 1 and hence is 
omitted) is also useful. 

Lemma 2: Take an arbitrary cut containing e from the 
source to both sinks, and let $Cut_{x,y}$ denote the sum of 
the capacities of the edges on this cut. Then $\frac{1}{n}H(K|X^n, Y^n)$ 
must satisfy the following inequality: 

    $\frac{1}{n}H(K|X^n, Y^n) \le Cut_{x,y} - C_e + d_e - H(X, Y) + k(\epsilon)$

for some function $k(\epsilon)$ that converges to zero as $\epsilon$ converges 
to zero. 



Thus for every $(n, \epsilon)$ code we write all such constraints on 
the coordinates of 

    $\left[\frac{1}{n}H(K), \frac{1}{n}H(K|X^n), \frac{1}{n}H(K|Y^n), \frac{1}{n}H(K|X^n, Y^n)\right].$

Lastly we look at these constraints over a sequence of codes 
$(n_i, \epsilon_i)$ where $\epsilon_i \to 0$ as $i \to \infty$. As an example, consider 
a problem with no secrecy constraints. Let $\mathrm{Mincut}^e_x$ be the 
smallest cut that has the edge e and separates the source from 
the first sink. $\mathrm{Mincut}^e_y$ and $\mathrm{Mincut}^e_{x,y}$ are defined similarly. 
For the code $(n_i, \epsilon_i)$, with $K_i$ the random variable on edge e 
and $d_{e,i}$ its entropy rate, we have 

    $\frac{1}{n_i}H(K_i) = d_{e,i},$
    $\frac{1}{n_i}H(K_i|X^{n_i}) \le \mathrm{Mincut}^e_x - C_e + d_{e,i} - H(X) + k(\epsilon_i),$
    $\frac{1}{n_i}H(K_i|Y^{n_i}) \le \mathrm{Mincut}^e_y - C_e + d_{e,i} - H(Y) + k(\epsilon_i),$
    $\frac{1}{n_i}H(K_i|X^{n_i}, Y^{n_i}) = 0 \le \mathrm{Mincut}^e_{x,y} - C_e + d_{e,i} - H(X, Y) + k(\epsilon_i).$

There is a convergent subsequence of $d_{e,i}$ converging to some 
$d^*_e \le C_e$. Therefore the region U(p) contains a point 
$[u_1, u_2, u_3, u_4]$ such that 

    $u_1 = d^*_e,$
    $u_2 \le \mathrm{Mincut}^e_x - C_e + d^*_e - H(X),$
    $u_3 \le \mathrm{Mincut}^e_y - C_e + d^*_e - H(Y),$
    $u_4 = 0 \le \mathrm{Mincut}^e_{x,y} - C_e + d^*_e - H(X, Y).$

From Theorem 2 we know that there exist $c, g, h \ge 0$ and 
$p(e|x, y)$ such that 

    $u_1 = c + I(E; X, Y), \qquad u_2 = c + I(E; Y|X) + g,$
    $u_3 = c + I(E; X|Y) + h, \qquad u_4 = c.$

Since $u_4 = 0$ forces c = 0 and $g, h \ge 0$, there exists a $p(e|x, y)$ such that 

    $d^*_e = I(E; X, Y) \le C_e$    (1)
    $\mathrm{Mincut}^e_x - C_e + d^*_e - H(X) \ge I(E; Y|X)$    (2)
    $\mathrm{Mincut}^e_y - C_e + d^*_e - H(Y) \ge I(E; X|Y).$    (3)

And furthermore $0 \le \mathrm{Mincut}^e_{x,y} - C_e + d^*_e - H(X, Y)$. These 
inequalities together form a converse for the edge e. We can 
repeat this process for all the edges and take the intersection over 
all such converses. 

A. Comparison with the cut-set bound 

Let us compare the above converse with the one given by 
the cut-set bound. Take some edge e. The constraints 

    $d^*_e = I(E; X, Y) \le C_e,$
    $\mathrm{Mincut}^e_x - C_e + d^*_e - H(X) \ge I(E; Y|X),$
    $\mathrm{Mincut}^e_y - C_e + d^*_e - H(Y) \ge I(E; X|Y),$
    $\mathrm{Mincut}^e_{x,y} - C_e + d^*_e - H(X, Y) \ge 0$





Fig. 3. This network is the Gray-Wyner system when C3 = C4 = C5. 

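imply that $\mathrm{Mincut}^e_x - H(X) \ge 0$, $\mathrm{Mincut}^e_y - H(Y) \ge 0$ and 
$\mathrm{Mincut}^e_{x,y} - H(X, Y) \ge 0$, because $d^*_e \le C_e$ and mutual 
information is non-negative. Since edge e was arbitrary, one 
can see that this converse is no worse than the cut-set bound. 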
Let us consider the network given in Figure 3. Assume that 
$C_3 = C_4 = C_5$. This network is known as the Gray-Wyner 
system [5]. Let us write the converse for edge number 3. 
The converse says that there exists a $p(e|x, y)$ such that 

    $d^*_3 = I(E; X, Y) \le C_3,$
    $\mathrm{Mincut}^3_x - C_3 + d^*_3 - H(X) \ge I(E; Y|X),$
    $\mathrm{Mincut}^3_y - C_3 + d^*_3 - H(Y) \ge I(E; X|Y),$
    $\mathrm{Mincut}^3_{x,y} - C_3 + d^*_3 - H(X, Y) \ge 0.$

Note that $\mathrm{Mincut}^3_x = C_4 + C_1 = C_3 + C_1$, $\mathrm{Mincut}^3_y = 
C_5 + C_2 = C_3 + C_2$ and $\mathrm{Mincut}^3_{x,y} = C_1 + C_2 + C_3$. Thus 

    $d^*_3 = I(E; X, Y) \le C_3,$
    $C_3 + C_1 - C_3 + d^*_3 - H(X) \ge I(E; Y|X),$
    $C_3 + C_2 - C_3 + d^*_3 - H(Y) \ge I(E; X|Y),$
    $C_1 + C_2 + C_3 - C_3 + d^*_3 - H(X, Y) \ge 0.$

After simplification and substituting the value $d^*_3 = 
I(E; X, Y)$ from the first equation into the other equations 
we get that 

    $C_3 \ge I(E; X, Y),$
    $C_1 \ge I(E; Y|X) - I(E; X, Y) + H(X) = H(X|E),$
    $C_2 \ge I(E; X|Y) - I(E; X, Y) + H(Y) = H(Y|E),$
    $C_1 + C_2 \ge H(X, Y) - I(E; X, Y) = H(X, Y|E).$

The last inequality is redundant, since $H(X|E) + H(Y|E) \ge H(X, Y|E)$. Therefore we get 

    $C_3 \ge I(E; X, Y), \quad C_1 \ge H(X|E), \quad C_2 \ge H(Y|E)$

for some $p(e|x, y)$. But this is exactly the solution to the 
Gray-Wyner system [5]. Therefore the new converse is tight. 
On the other hand the cut-set bound is not tight for this 
network. Let us consider the minimum of $C_3$ such that 
$C_1 + C_2 + C_3 = H(X, Y)$ over the actual region and the 
cut-set bound. It is known that in the Gray-Wyner system this 
minimum is equal to Wyner's common information [6]. 
However, in the cut-set bound this minimum is I(X; Y), which 
can be strictly less than Wyner's common information. 




Fig. 4. An explicit example for a multi-source problem that shows the benefit 
of using edge-cuts. We write the edge-cut for edge 6. 

Therefore the new converse represents a strict improvement 
over the cut-set bound. 
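
The gap between the two bounds can be made concrete with a small numerical sketch. Everything below is illustrative and rests on assumptions not stated in the text: the source is taken to be a doubly symmetric binary source with crossover probability `a0`, and the auxiliary variable E is one admissible choice (a uniform bit observed through two independent binary symmetric channels with parameter `a1`) that makes X and Y conditionally independent. The sketch only evaluates the three bounds for this E; that this choice attains Wyner's common information is a known result of Wyner that the code does not verify. The helper names are hypothetical.

```python
from math import log2, sqrt
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def marginal(joint, keep):
    """Marginalize a pmf over tuples onto the coordinate indices in keep."""
    out = defaultdict(float)
    for outcome, q in joint.items():
        out[tuple(outcome[i] for i in keep)] += q
    return out

def gray_wyner_bounds(joint_exy):
    """Return (I(E;X,Y), H(X|E), H(Y|E)) for a pmf over (e, x, y) triples."""
    h = lambda idx: entropy(marginal(joint_exy, idx))
    c3 = h((0,)) + h((1, 2)) - h((0, 1, 2))  # lower bound on C_3
    c1 = h((0, 1)) - h((0,))                 # lower bound on C_1
    c2 = h((0, 2)) - h((0,))                 # lower bound on C_2
    return c3, c1, c2

a0 = 0.1                          # assumed P(X != Y)
a1 = (1 - sqrt(1 - 2 * a0)) / 2   # chosen so that 2*a1*(1-a1) = a0
joint = defaultdict(float)
for e in (0, 1):
    for n1 in (0, 1):
        for n2 in (0, 1):
            q = 0.5 * (a1 if n1 else 1 - a1) * (a1 if n2 else 1 - a1)
            joint[(e, e ^ n1, e ^ n2)] += q   # X = E xor N1, Y = E xor N2

c3, c1, c2 = gray_wyner_bounds(joint)
h_xy = entropy(marginal(joint, (1, 2)))
i_xy = entropy(marginal(joint, (1,))) + entropy(marginal(joint, (2,))) - h_xy
```

For this E the three bounds sum to H(X, Y), so the point lies on the plane $C_1 + C_2 + C_3 = H(X, Y)$, yet the required `c3` is noticeably larger than `i_xy`, the value the cut-set bound would allow.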

B. Using "Edge-Cuts" to write better converses 

The new converse as expressed above is also not tight in 
general. In the above discussion we observed that every cut 
that has the edge e and separates the source from the first sink 
imposes a constraint on $\frac{1}{n}H(K|X^n)$. However, it turns out that 
one can use the technique to write strictly better converses by 
looking at what might be termed "edge-cuts" (certain cuts in 
certain subgraphs of the original graph) if there are multiple 
source nodes in the network. Our concept of edge-cuts should 
not be confused with that of [?]. 

In order to construct an explicit example for multi-source 
problems that shows the benefit of using edge-cuts, we consider 
the directed network of Figure 4, with two sources $s_1$ and $s_2$ and 
two sinks $t_1$ and $t_2$, under the assumption that 
$C_6 = C_7 = C_8$. 

Suppose that the source $s_1$ observes i.i.d. copies of the 
random variable X, and source $s_2$ observes i.i.d. copies of 
the random variable Y. As before, random variables X and 
Y are jointly distributed according to p(x, y), and sink $t_1$ is 
interested in the i.i.d. copies of X while sink $t_2$ is interested 
in the i.i.d. copies of Y. We consider the problem of reliable 
transmission to fulfill the demands of both sink nodes, with 
probability of decoding error converging to zero as the number 
of i.i.d. observations of X, Y grows without bound. 

1) Edge-cuts: Take an arbitrary edge e in a directed graph 
from a vertex $v_1$ to a vertex $v_2$. Consider the subgraph formed 
by including all the directed paths from the two sources to $v_2$. 
We can think of $v_2$ as an imaginary sink in this subgraph. Let 
K denote the random variable carried on the edge from $v_1$ to $v_2$. We 
can consider three types of cuts between the two sources and 
the imaginary sink in this subgraph: (1) cuts that separate 
the first source from node $v_2$ but do not separate the second 
source from node $v_2$, (2) cuts that separate the second source 
from $v_2$ but do not separate the first source from node $v_2$, and 
(3) cuts that separate both sources from node $v_2$. Let $Cut_{x,y,v_2}$ 
denote the sum-capacity of an arbitrary cut that separates both 
sources from node $v_2$ in the subgraph. We have 

    $Cut_{x,y,v_2} \ge \frac{1}{n}I(K; X^n, Y^n).$

Let $Cut_{x,v_2}$ denote the sum-capacity of an arbitrary cut that 
separates the first source from node $v_2$ in the subgraph. We 
have 

    $Cut_{x,v_2} \ge \frac{1}{n}I(K; X^n|Y^n).$

Similarly, let $Cut_{y,v_2}$ denote the sum-capacity of an arbitrary 
cut that separates the second source from node $v_2$ in the 
subgraph. We have 

    $Cut_{y,v_2} \ge \frac{1}{n}I(K; Y^n|X^n).$

These inequalities have consequences for the uncertainty 
vector $[\frac{1}{n}H(K), \frac{1}{n}H(K|X^n), \frac{1}{n}H(K|Y^n), \frac{1}{n}H(K|X^n, Y^n)]$. 

Fig. 5. The subgraph formed by including all the directed paths from the 
two sources to the end point of edge 6, i.e. the node $v_2$. We can think of 
$v_2$ as an imaginary sink in this subgraph. Edge-cuts are the cuts between the 
two sources and the imaginary sink in this subgraph. 

Consider the edge 6 in Figure 4. The resulting subgraph 
formed by including all the directed paths from the two 
sources to the end point of this edge is shown in 
Figure 5. Let $K_6$ denote the random variable carried on this 
edge. Observe that edge 2 is a cut that separates the first 
source only from the imaginary sink. Therefore we can write 
$\frac{1}{n}I(K_6; X^n|Y^n) \le C_2$. Since $H(K_6|X^n, Y^n) = 0$, we 
conclude that $\frac{1}{n}H(K_6|Y^n) \le C_2$. It is not possible to get 
this constraint on the uncertainty of $K_6$ given $Y^n$ by looking 
at the cuts between the sources and the sinks in the original 
graph. To see this, note that if we use equations (1)-(3) for all the 
cuts that have the edge 6 we get the following set of inequalities: 

    $d_6 = I(E_6; XY) \le C_6$

    $C_4 + C_6 - C_6 + d_6 - H(X) \ge I(E_6; Y|X)$
        because {4, 7} is a cut between $s_1, s_2$ and $t_1$ in the original graph

    $C_5 + C_6 - C_6 + d_6 - H(Y) \ge I(E_6; X|Y)$
        because {5, 8} is a cut between $s_1, s_2$ and $t_2$ in the original graph

for some $p(e_6|x, y)$. Here we used the fact that the capacities 
of edges 6, 7 and 8 are all the same, hence we can assume 
that they are all carrying the same message. Therefore we can 
compute the uncertainty of the message on edge 6 by looking 
at cuts that include edge 7 or 8. 

The next step is to combine the inequality 
$\frac{1}{n}H(K_6|Y^n) \le C_2$ with the above set of inequalities. 
Remember that $C_5 + C_6 - C_6 + d_6 - H(Y)$ in the third 
inequality above is an upper bound on $\frac{1}{n}H(K_6|Y^n)$. This 
comes from Lemma 1. The term $I(E_6; X|Y)$ is a lower 
bound on $\frac{1}{n}H(K_6|Y^n)$. This comes from Theorem 2. Now, 
using the inequality $\frac{1}{n}H(K_6|Y^n) \le C_2$ we can conclude that 
$\min(C_2, C_5 + C_6 - C_6 + d_6 - H(Y))$ is an upper bound on 
$\frac{1}{n}H(K_6|Y^n)$. Thus, we can write 

    $d_6 = I(E_6; XY) \le C_6$

    $C_4 + C_6 - C_6 + d_6 - H(X) \ge I(E_6; Y|X)$
        because {4, 7} is a cut between $s_1, s_2$ and $t_1$ in the original graph

    $\min(C_2, C_5 + C_6 - C_6 + d_6 - H(Y)) \ge I(E_6; X|Y)$
        because {5, 8} is a cut between $s_1, s_2$ and $t_2$ in the original graph

for some $p(e_6|x, y)$. This set of inequalities can be simplified to 
the following form 

    $C_6 \ge I(E_6; XY)$    (4)
    $C_4 \ge H(X|E_6)$    (5)
    $C_5 \ge H(Y|E_6)$    (6)
    $C_2 \ge I(E_6; X|Y)$    (7)

for some $p(e_6|x, y)$. 

2) Comparison of two converses: We now compare the 
converse given by equations (1)-(3) with the converse given 
by equations (4)-(7). The former converse is derived in the 
appendix by looking at all cuts between the sources and the 
sinks (no edge-cuts here). 

We claim that the minimum possible value of $C_6$ in this 
converse is less than or equal to I(X; Y) if we restrict 
ourselves to networks where $C_2 + C_4 = H(X|Y)$. This is shown 
at the end of the appendix. Next consider the converse written 
using edge-cuts and given by equations (4)-(7). We show that 
the minimum in this converse is $\min_{X \to E \to Y} I(E; XY)$, 
i.e. Wyner's common information. From equations (5) and (7) 
we have $C_2 + C_4 \ge H(X|E_6) + I(E_6; X|Y) = H(X|E_6) + 
H(X|Y) - H(X|E_6, Y) = H(X|Y) + I(X; Y|E_6)$. If we 
restrict ourselves to networks where $C_2 + C_4 = H(X|Y)$, 
it must be the case that $I(X; Y|E_6) = 0$, i.e. the random 
variables $X \to E_6 \to Y$ 
form a Markov chain. Therefore the minimum of $C_6$ is 
$\min_{X \to E_6 \to Y} I(E_6; X, Y)$, which is equal to Wyner's common 
information. 

Noting that Wyner's common information is in general 
larger than I(X; Y), we conclude that the latter converse is 
strictly better than the former converse. 



V. Proofs 

Proof of Theorem 1: 
Achievability: We begin by showing that each of the four 
sets of points is a subset of U(p). This would complete the 
proof, noting that U(p) is a convex set in $\mathbb{R}^4$, as it implies that 
the convex envelope of the union of the four sets of points is 
also a subset of U(p). The details of U(p) being a convex set 
are given in [?]. Note that if we can prove the inclusion for 
c = 0 in each case, we will have it for all $c \ge 0$ since we can 
always add noise to K that is independent of all previously 
defined random variables. Let us begin with the first set of 
points. Take some arbitrary $p(e|x, y)$. We would like to find a 
sequence of $p(k_n, x^n, y^n)$ such that 

    $\lim_{n \to \infty} \frac{1}{n}H(K_n) = I(E; X, Y),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|X^n) = I(E; Y|X),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|Y^n) = I(E; X|Y),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|X^n, Y^n) = 0.$

We use part 1 of Theorem 5 of [10], which says that one can 
find a sequence of $p(k_n, x^n, y^n)$ such that 

    $\lim_{n \to \infty} \frac{1}{n}I(X^n; Y^n|K_n) = I(X; Y|E),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|X^n) = I(E; Y|X),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|Y^n) = I(E; X|Y),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|X^n, Y^n) = 0.$

The only difference between this set of equations and the one we 
would like to have is the first equation. However, these two sets of 
equations are indeed equivalent. Note that 

    $H(K_n) = H(K_n|X^n) + H(K_n|Y^n) - H(K_n|X^n, Y^n) + I(X^n; Y^n) - I(X^n; Y^n|K_n).$

Thus, 

    $\lim_{n \to \infty} \frac{1}{n}H(K_n) = \lim_{n \to \infty} \frac{1}{n}H(K_n|X^n) + \lim_{n \to \infty} \frac{1}{n}H(K_n|Y^n)$
    $\qquad - \lim_{n \to \infty} \frac{1}{n}H(K_n|X^n, Y^n) + I(X; Y) - \lim_{n \to \infty} \frac{1}{n}I(X^n; Y^n|K_n)$
    $= I(E; Y|X) + I(E; X|Y) + I(X; Y) - I(X; Y|E)$
    $= I(E; X, Y).$
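
The entropy identity used above is stated without proof. One routine way to verify it (an expansion supplied here for convenience, not taken from the source) is to write both groups of terms through mutual informations with $K_n$:

```latex
\begin{align*}
I(X^n;Y^n) - I(X^n;Y^n \mid K_n)
  &= I(K_n;X^n) + I(K_n;Y^n) - I(K_n;X^n,Y^n),\\
H(K_n \mid X^n) + H(K_n \mid Y^n) - H(K_n \mid X^n,Y^n)
  &= H(K_n) - I(K_n;X^n) - I(K_n;Y^n) + I(K_n;X^n,Y^n).
\end{align*}
```

Adding the two lines cancels the mutual information terms and leaves $H(K_n)$. The same expansion with E in place of $K_n$ and (X, Y) in place of $(X^n, Y^n)$ gives the final single-letter equality.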

We now prove that the second and the third sets of points 
are in U(p). The Slepian-Wolf theorem tells us that for any $\epsilon$ one can find N 
such that for any n > N there are functions $M_{x^n}: \mathcal{X}^n \to [1 : 2^{n(H(X|Y)+\epsilon)}]$ 
and $M_{y^n}: \mathcal{Y}^n \to [1 : 2^{n(H(Y|X)+\epsilon)}]$ such that 
$X^n$ can be recovered from $(M_{x^n}(X^n), Y^n)$, and $Y^n$ can be 
recovered from $(M_{y^n}(Y^n), X^n)$, with probability $1 - \epsilon$. One 
can prove that² 

    $\frac{1}{n}I(M_{x^n}(X^n); Y^n) \le r_1(\epsilon),$    (8)
    $\frac{1}{n}I(M_{y^n}(Y^n); X^n) \le r_2(\epsilon),$    (9)
    $\frac{1}{n}H(M_{x^n}(X^n)) \ge H(X|Y) - r_3(\epsilon),$    (10)
    $\frac{1}{n}H(M_{y^n}(Y^n)) \ge H(Y|X) - r_4(\epsilon),$    (11)

for some functions $r_i$ such that $r_i(\epsilon)$ converges to zero as 
$\epsilon$ converges to zero. Setting $K_n = M_{y^n}(Y^n)$ would give 
us the second set of points as $\epsilon \to 0$ and $n \to \infty$. 
To see this, note that $\lim_{n \to \infty} \frac{1}{n}H(K_n) = H(Y|X)$ 
because of equation (11) and the fact that $M_{y^n}$ takes 
values in $[1 : 2^{n(H(Y|X)+\epsilon)}]$. Furthermore one can show 
that $\lim_{n \to \infty} \frac{1}{n}H(K_n|X^n) = H(Y|X)$ using equation (9). 
Similarly, setting $K_n = M_{x^n}(X^n)$ asymptotically gives us the 
third set of points. 

We now prove that the fourth set of points is in U(p). In 
order to define $K_n$ appropriately to get this set of points we are 
going to use the random variables $M_{y^n}$ and $M_{x^n}$ defined above. 
For every $n \in \mathbb{N}$, we can find some $\epsilon_n$ such that equations (8)-(11) 
hold, and such that $\epsilon_n$ converges to zero as n converges to infinity. 
Next, take some arbitrary $0 \le f \le \max(H(X|Y), H(Y|X))$. 
We would like to find a sequence of $p(k_n, x^n, y^n)$ such that 

    $\lim_{n \to \infty} \frac{1}{n}H(K_n) = f,$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|X^n) = \min(f, H(Y|X)),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|Y^n) = \min(f, H(X|Y)),$
    $\lim_{n \to \infty} \frac{1}{n}H(K_n|X^n, Y^n) = 0.$

Let us define the functions $M_{x^n} \in [1 : 2^{n(H(X|Y)+\epsilon_n)}]$ 
and $M_{y^n} \in [1 : 2^{n(H(Y|X)+\epsilon_n)}]$ as above. We can think of 
$M_{x^n}(X^n)$ and $M_{y^n}(Y^n)$ as two random binary sequences of 
length $\lfloor n(H(X|Y)+\epsilon_n) \rfloor$ and $\lfloor n(H(Y|X)+\epsilon_n) \rfloor$ respectively. 
Let us use the notation $M^{i:j}_{y^n}(Y^n)$ to denote the $i$-th to 
$j$-th bits of $M_{y^n}(Y^n)$. We use a similar notation for $M_{x^n}(X^n)$. 

Without loss of generality let us assume that $H(X|Y) \ge 
H(Y|X)$. Consider the following two cases: 

Case 1. f < H{Y\X): 

In this case, we let K n be equal to the bitwise XOR 
of the first |n/J bits of M xn (X n ) and M yn (Y n ), i.e. the 

2 For instance the first equation holds because —I(M X n(X n );Y n ) = 
±(H(M xn (X n ))+H(Y")-H(M xn (X"),Y n )) n = ±(H(M X „(X")) + 
H(Y n ) - H(X n ,Y n ) + H(X n \M xn (X n ),Y n )) < H(X\Y) + e + 
H(Y) - H(X, Y) + h(e) +e\X\\y\ by the Fano inequality and the fact that 
M xn is a function of X". The third equation holds because it is possible to 
reconstruct (X" , Y n ) from M xn (X n ) and Y n with high probability. 



bitwise XOR of M^ nfi {X n ) and My^ nfi (Y n ). Clearly 
±H(K n \X n , Y n ) = 0. We would like to show that 



lim -H(K n ) = f, 

n—t-oc n 

lim -H{K n \X n ) = f, 

n— hoc fi 

lim -H(K n \Y n ) = f. 

n— foo Ji 



It suffices to prove the last two inequalities since 
H{K n \X n ) < H(K n ) < log \JCn\ < nf. We prove the second 
one, the proof for the third is similar. Note that H(K n \X n ) = 
H(K n \X n ,M^ nfi (X n )) = H{M^ nn {Y n )\X n ). Equa- 
tion [9] implies that 

-I(M^i(Y n );X n )<r 2 (e n ). 

Thus, 

lim -H(K n \X n )= lim -H (Af*WJ (y n )). 



Clearly

$$\lim_{n\to\infty} \frac{1}{n}H\bigl(M_{y^n}^{1:\lfloor nf \rfloor}(Y^n)\bigr) \le f.$$

If $\lim_{n\to\infty} \frac{1}{n}H(M_{y^n}^{1:\lfloor nf \rfloor}(Y^n)) < f$, then additionally considering the $\lfloor nf \rfloor + 1$ to $\lfloor nH(Y|X) + n\epsilon_n \rfloor$ bits of $M_{y^n}$ can at most increase the asymptotic entropy rate by $H(Y|X) - f$ bits. On the other hand, equation [11] implies that $\lim_{n\to\infty} \frac{1}{n}H(M_{y^n}(Y^n)) = H(Y|X)$. This is a contradiction, because using the fact that the joint entropy is less than or equal to the sum of the individual entropies one can write

$$\lim_{n\to\infty} \frac{1}{n}H(M_{y^n}(Y^n)) \le \lim_{n\to\infty} \frac{1}{n}H\bigl(M_{y^n}^{1:\lfloor nf \rfloor}(Y^n)\bigr) + \lim_{n\to\infty} \frac{1}{n}H\bigl(M_{y^n}^{\lfloor nf \rfloor+1:\lfloor nH(Y|X)+n\epsilon_n \rfloor}(Y^n)\bigr) < f + H(Y|X) - f = H(Y|X).$$

Case 2. $H(Y|X) < f \le H(X|Y)$: In this case, let $K_n$ be equal to the bitwise XOR of $M_{x^n}^{1:\lfloor nH(Y|X) \rfloor}(X^n)$ and $M_{y^n}^{1:\lfloor nH(Y|X) \rfloor}(Y^n)$, together with $M_{x^n}^{\lfloor nH(Y|X) \rfloor+1:\lfloor nf \rfloor}(X^n)$.

In this case, one needs to show that

$$\lim_{n\to\infty} \frac{1}{n}H(K_n) = f,$$
$$\lim_{n\to\infty} \frac{1}{n}H(K_n|X^n) = H(Y|X),$$
$$\lim_{n\to\infty} \frac{1}{n}H(K_n|Y^n) = f.$$

As in case 1, the third equation implies the first. The proof for the last two limits is similar to the one discussed above in case 1.

Converse: Since $U(p)$ is convex, to show that the region $U(p)$ is equal to the convex envelope of the given set of points, it suffices to show that for any real $\lambda_1, \ldots, \lambda_4$, the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ is achieved at one of the given points. We show this by a case by case analysis. First assume that $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 > 0$. In this case the maximum will be infinity and is achieved at the point $[c, c, c, c]$ when $c \to \infty$. If $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 \le 0$, we can write the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ as

$$\limsup \frac{1}{n}\Bigl(\lambda_1 I(K;X^nY^n) + \lambda_2 I(K;Y^n|X^n) + \lambda_3 I(K;X^n|Y^n) + (\lambda_1+\lambda_2+\lambda_3+\lambda_4)H(K|X^n,Y^n)\Bigr).$$

The last term $(\lambda_1+\lambda_2+\lambda_3+\lambda_4)H(K|X^n,Y^n)$ is less than or equal to zero. Given any $(K, X^n, Y^n)$, we can always use part 1 of Theorem 5 of |@] as in the achievability to find $(K', X^{nm}, Y^{nm})$ for some $m$ such that $K'$ is a function of $(X^{nm}, Y^{nm})$ and the sum of the first three terms is asymptotically unchanged. $K'$ being a function of $(X^{nm}, Y^{nm})$ implies that $(\lambda_1+\lambda_2+\lambda_3+\lambda_4)H(K'|X^{nm},Y^{nm})$ is zero. To sum up, without loss of generality we can consider only random variables $K$ that are deterministic functions of $(X^n, Y^n)$, and furthermore we only need to compute the following expression over such random variables:

$$\limsup \frac{1}{n}\Bigl[\lambda_1 I(K;X^nY^n) + \lambda_2 I(K;Y^n|X^n) + \lambda_3 I(K;X^n|Y^n)\Bigr].$$
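The rewriting of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ in terms of three mutual informations plus $(\lambda_1+\lambda_2+\lambda_3+\lambda_4)H(K|X^n,Y^n)$ follows from $H(K|\cdot) = I(K;\cdot\,|\cdot) + H(K|X^n,Y^n)$. Below is a small numerical sanity check for $n = 1$; the random joint pmf, the weights in `lam`, and all helper names are assumptions made for illustration. Mutual information is computed from its log-ratio definition so that the comparison is not a pure restatement of the entropy algebra.

```python
import itertools
import math
import random

def pick(outcome, idx):
    return tuple(outcome[i] for i in idx)

def marginal(pmf, idx):
    out = {}
    for o, p in pmf.items():
        out[pick(o, idx)] = out.get(pick(o, idx), 0.0) + p
    return out

def entropy(pmf, idx):
    """Entropy in bits of the marginal on the coordinates in idx."""
    return -sum(p * math.log2(p) for p in marginal(pmf, idx).values() if p > 0)

def cond_mi(pmf, a, b, c=()):
    """I(A;B|C) in bits, from the log-ratio definition."""
    p_abc, p_ac = marginal(pmf, a + b + c), marginal(pmf, a + c)
    p_bc, p_c = marginal(pmf, b + c), marginal(pmf, c)
    total = 0.0
    for o, p in pmf.items():
        if p > 0:
            num = p_abc[pick(o, a + b + c)] * p_c[pick(o, c)]
            den = p_ac[pick(o, a + c)] * p_bc[pick(o, b + c)]
            total += p * math.log2(num / den)
    return total

random.seed(1)
outcomes = list(itertools.product(range(2), range(2), range(3)))  # (k, x, y)
w = [random.random() for _ in outcomes]
pmf = {o: wi / sum(w) for o, wi in zip(outcomes, w)}
K, X, Y = (0,), (1,), (2,)

u1 = entropy(pmf, K)
u2 = entropy(pmf, K + X) - entropy(pmf, X)          # H(K|X)
u3 = entropy(pmf, K + Y) - entropy(pmf, Y)          # H(K|Y)
u4 = entropy(pmf, K + X + Y) - entropy(pmf, X + Y)  # H(K|X,Y)

lam = (0.7, -1.2, 0.4, -0.3)  # arbitrary weights
lhs = sum(l * u for l, u in zip(lam, (u1, u2, u3, u4)))
rhs = (lam[0] * cond_mi(pmf, K, X + Y) + lam[1] * cond_mi(pmf, K, Y, X)
       + lam[2] * cond_mi(pmf, K, X, Y) + sum(lam) * u4)
```

The two sides `lhs` and `rhs` agree up to floating-point error; the same algebra applies for general $n$ after normalizing by $\frac{1}{n}$.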



We now continue by a case by case analysis: 

• $\lambda_1 \ge 0, \lambda_2 \ge 0, \lambda_3 \ge 0$: Note that if we replace $K$ with $(K, X^n, Y^n)$ the expression will not decrease. Since $K$ is a function of $(X^n, Y^n)$, we conclude that $K = X^nY^n$ is the optimal choice in this instance. In this case the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ will be equal to the maximum of the same expression over the first set of points with the choice of $E = XY$.

• $\lambda_1 \ge 0, \lambda_2 \le 0, \lambda_3 \ge 0$: If $\lambda_1 + \lambda_2 \ge 0$, the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ will be equal to the maximum of the same expression over the first set of points with the choice of $E = XY$. To see this write $\lambda_2 I(K;Y^n|X^n)$ as $\lambda_2 I(K;Y^n,X^n) - \lambda_2 I(K;X^n)$ and note that the expression is maximized when $K = X^nY^n$. If $\lambda_1 + \lambda_2 < 0$, first note that if we replace $K$ with $(K, X^n)$ the expression will not decrease. In this case the expression $\lambda_1 I(K,X^n;X^nY^n) + \lambda_2 I(K,X^n;Y^n|X^n) + \lambda_3 I(K,X^n;X^n|Y^n)$ will be equal to $\lambda_1 H(X^n) + \lambda_3 H(X^n|Y^n) + (\lambda_1+\lambda_2)I(K;Y^n|X^n)$. Since $\lambda_1 + \lambda_2 < 0$, we have $(\lambda_1+\lambda_2)I(K;Y^n|X^n) \le 0$. Thus the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ will be less than or equal to $\lambda_1 H(X) + \lambda_3 H(X|Y)$, which is equal to the maximum of the same expression over the first set of points with the choice of $E = X$.

• $\lambda_1 \ge 0, \lambda_2 \ge 0, \lambda_3 \le 0$: This case is similar to case 2 by symmetry.

• $\lambda_1 \ge 0, \lambda_2 \le 0, \lambda_3 \le 0$: Take some arbitrary $n$ and $K = f(X^n, Y^n)$. Let the random index $J$ be uniformly distributed on $\{1, 2, 3, \ldots, n\}$ and independent of $(K, X^n, Y^n)$. Define the auxiliary random variables $E = (K, X_{1:J-1}, Y_{1:J-1}, J)$, $X = X_J$, $Y = Y_J$. Note that

$$I(K;X^n,Y^n) = \sum_{j=1}^{n} I(K;X_j,Y_j|X_{1:j-1},Y_{1:j-1}) = \sum_{j=1}^{n} I(K,X_{1:j-1},Y_{1:j-1};X_j,Y_j) = nI(E;X,Y),$$

$$I(K;Y^n|X^n) = \sum_{j=1}^{n} I(K;Y_j|X^n,Y_{1:j-1}) \ge \sum_{j=1}^{n} I(K,X_{1:j-1},Y_{1:j-1};Y_j|X_j) = nI(E;Y|X),$$

and similarly

$$I(K;X^n|Y^n) \ge nI(E;X|Y).$$

Since $\lambda_2 \le 0$, $\lambda_3 \le 0$, we have $\lambda_2 \frac{1}{n}I(K;Y^n|X^n) \le \lambda_2 I(E;Y|X)$ and $\lambda_3 \frac{1}{n}I(K;X^n|Y^n) \le \lambda_3 I(E;X|Y)$. Therefore the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ will be less than or equal to the maximum of the same expression over the first set of points.

• $\lambda_1 \le 0, \lambda_2 \ge 0, \lambda_3 \le 0$: If $\lambda_1 + \lambda_2 \ge 0$, we can write

$$\lambda_1 I(K;X^nY^n) + \lambda_2 I(K;Y^n|X^n) + \lambda_3 I(K;X^n|Y^n) \le \lambda_1 I(K;X^nY^n) + \lambda_2 I(K;Y^n|X^n)$$
$$= \lambda_1 I(K;X^n) + (\lambda_1+\lambda_2)I(K;Y^n|X^n) \le (\lambda_1+\lambda_2)I(K;Y^n|X^n) \le (\lambda_1+\lambda_2)H(Y^n|X^n).$$

Thus the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ will be less than or equal to $(\lambda_1+\lambda_2)H(Y|X)$, which is equal to the maximum of the same expression over the second set of points. If $\lambda_1 + \lambda_2 < 0$, we can write

$$\lambda_1 I(K;X^nY^n) + \lambda_2 I(K;Y^n|X^n) + \lambda_3 I(K;X^n|Y^n) = (\lambda_1+\lambda_2)I(K;X^nY^n) - \lambda_2 I(K;X^n) + \lambda_3 I(K;X^n|Y^n) \le 0.$$

Thus the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ will be zero.

• $\lambda_1 \le 0, \lambda_2 \le 0, \lambda_3 \ge 0$: This is similar to case 5.

• $\lambda_1 \le 0, \lambda_2 \le 0, \lambda_3 \le 0$: This is similar to case 4.

• $\lambda_1 \le 0, \lambda_2 \ge 0, \lambda_3 \ge 0$: If $\lambda_1 + \lambda_2 + \lambda_3 \le 0$,

$$\lambda_1 I(K;X^nY^n) + \lambda_2 I(K;Y^n|X^n) + \lambda_3 I(K;X^n|Y^n) = (\lambda_1+\lambda_2+\lambda_3)I(K;X^nY^n) - \lambda_2 I(K;X^n) - \lambda_3 I(K;Y^n) \le 0.$$

Thus $K$ constant works here. If $\lambda_1 + \lambda_2 + \lambda_3 > 0$, using Lemma 3 we can write

$$\lambda_1 I(K;X^nY^n) + \lambda_2 I(K;Y^n|X^n) + \lambda_3 I(K;X^n|Y^n)$$
$$= (\lambda_1+\lambda_2+\lambda_3)I(K;X^nY^n) - \lambda_2 I(K;X^n) - \lambda_3 I(K;Y^n)$$
$$\le (\lambda_1+\lambda_2+\lambda_3)I(K;X^nY^n) - \lambda_2\bigl[I(K;X^nY^n) - H(Y^n|X^n)\bigr]_+ - \lambda_3\bigl[I(K;X^nY^n) - H(X^n|Y^n)\bigr]_+$$
$$= n\Bigl[\lambda_1 \tfrac{1}{n}I(K;X^nY^n) + \lambda_2 \min\bigl(\tfrac{1}{n}I(K;X^nY^n), H(Y|X)\bigr) + \lambda_3 \min\bigl(\tfrac{1}{n}I(K;X^nY^n), H(X|Y)\bigr)\Bigr].$$

Thus, the maximum of the original expression is less than or equal to

$$\max_{0 \le t \le H(X,Y)} \Bigl(\lambda_1 t + \lambda_2 \min(t, H(Y|X)) + \lambda_3 \min(t, H(X|Y))\Bigr) = \max_{0 \le t \le \max(H(X|Y),H(Y|X))} \Bigl(\lambda_1 t + \lambda_2 \min(t, H(Y|X)) + \lambda_3 \min(t, H(X|Y))\Bigr).$$

Thus the maximum of $\lambda_1 u_1 + \lambda_2 u_2 + \lambda_3 u_3 + \lambda_4 u_4$ over $U(p)$ will be less than or equal to the maximum of the same expression over the fourth set of points.

Lemma 3: Given any three random variables $X, Y, K$ where $K$ is a function of $(X, Y)$, we have

$$I(K;X) \ge [H(K) - H(Y|X)]_+,$$
$$I(K;Y) \ge [H(K) - H(X|Y)]_+,$$

where $[x]_+$ is $0$ when $x$ is negative and $x$ when it is non-negative.

Proof: We prove the first equation. The proof for the second one is similar. Since $I(K;X) \ge 0$, it suffices to show that $I(K;X) \ge H(K) - H(Y|X)$, which is equivalent to $H(Y,X) \ge H(K,X)$ and obviously true. ■

Proof of Theorem: Take some $n$ and $p(k|x^n, y^n)$ and consider the 4-tuple $(u_1, u_2, u_3, u_4)$:

$$u_1 = \frac{1}{n}H(K), \quad u_2 = \frac{1}{n}H(K|X^n), \quad u_3 = \frac{1}{n}H(K|Y^n), \quad u_4 = \frac{1}{n}H(K|X^n,Y^n).$$

Let $c = \frac{1}{n}H(K|X^n,Y^n)$. Let the random index $J$ be uniformly distributed on $\{1, 2, 3, \ldots, n\}$ and independent of $(K, X^n, Y^n)$. Define the auxiliary random variables $E = (K, X_{1:J-1}, Y_{1:J-1}, J)$, $X = X_J$, $Y = Y_J$. One can then verify that

$$I(K;X^n,Y^n) = nI(E;X,Y),$$
$$I(K;Y^n|X^n) \ge nI(E;Y|X),$$
$$I(K;X^n|Y^n) \ge nI(E;X|Y).$$

Thus, $u_1 = c + I(E;X,Y)$, $u_2 \ge c + I(E;Y|X)$ and $u_3 \ge c + I(E;X|Y)$ for some $p(e|x,y)$. ■
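Lemma 3 and the identity $a - [a-b]_+ = \min(a,b)$, which turns the Lemma 3 bound into the $\min$ form used in the last case, can both be sanity-checked numerically. The sketch below is illustrative only: the alphabet sizes, the random pmfs, the random maps $g$ with $K = g(X,Y)$, and the function names are all assumptions.

```python
import itertools
import math
import random

def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(pmf, keep):
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def check_lemma3(rng, nx=3, ny=3, nk=3, tol=1e-9):
    """Draw a random p(x, y) and a random map K = g(X, Y); test both bounds."""
    pairs = list(itertools.product(range(nx), range(ny)))
    w = [rng.random() for _ in pairs]
    total = sum(w)
    g = {xy: rng.randrange(nk) for xy in pairs}
    # Joint pmf over (k, x, y); K is a deterministic function of (X, Y).
    pmf = {(g[xy], xy[0], xy[1]): wi / total for xy, wi in zip(pairs, w)}
    H = lambda *idx: entropy(marginal(pmf, idx).values())
    hk = H(0)
    i_kx = H(0) + H(1) - H(0, 1)
    i_ky = H(0) + H(2) - H(0, 2)
    h_y_given_x = H(1, 2) - H(1)
    h_x_given_y = H(1, 2) - H(2)
    ok_x = i_kx >= max(hk - h_y_given_x, 0.0) - tol
    ok_y = i_ky >= max(hk - h_x_given_y, 0.0) - tol
    return ok_x and ok_y

def min_via_plus(a, b):
    """a - [a - b]_+ equals min(a, b)."""
    return a - max(a - b, 0.0)

rng = random.Random(0)
all_ok = all(check_lemma3(rng) for _ in range(200))
```

A passing run is only evidence on the sampled instances; the proof of the lemma is the short argument given above.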
Appendix
A. Using the cuts to write a converse 

In this appendix we use cuts between sources and sinks to 
write a converse for the network of Figure Since there are 
two sources and two sinks in this network there are more types 
of cuts to consider. Every cut divides the nodes of the network into two sets $A$ and $A^c$. We use the notation cut(sources in $A$; sources in $A^c$; sinks in $A^c$) to denote the edges of such a cut. For instance in Figure 4, {4,2} is cut($s_1$; $s_2$; $t_1, t_2$), meaning that edges 4 and 2 are the edges of a cut that has $s_1$ in $A$, $s_2$ in $A^c$ and sinks $t_1, t_2$ in $A^c$. Suppose we want to write the converse for an edge $e$ in cut(sources in $A$; sources in $A^c$; sinks in $A^c$). If there is no source in $A^c$, then we can write a converse as discussed earlier in equations (1)-(3). However, if there is a source in $A^c$, say $s_2$, we need to use a modified version of Lemma 1, used to bound the entropy of the random variable on an edge of the cut conditioned on a source that is in $A$. The inequality of the lemma is weakened by adding the joint entropy of all the sources in $A^c$ to one side of the inequality as shown below.

Lemma 1 [revisited]: Take an arbitrary cut containing $e$ from the first source to the first sink, and let $\mathrm{Cut}_x$ denote the sum of the capacities of the edges on this cut. Further assume that $s_2$ is in $A^c$. Then $\frac{1}{n}H(K|X^n)$ and $\frac{1}{n}H(K|X^n,Y^n)$ must satisfy the following inequalities:

$$\frac{1}{n}H(K|X^n) \le \mathrm{Cut}_x - C_e + d_e + H(Y) - H(X) + k(\epsilon),$$

$$\frac{1}{n}H(K|X^n,Y^n) \le \mathrm{Cut}_x - C_e + d_e + H(Y) - H(X,Y) + k(\epsilon),$$

for some function $k(\epsilon)$ that converges to zero as $\epsilon$ converges to zero.

Proof: Let $Q$ denote the collection of random variables passing over the edges of the cut (except $e$). Clearly $\frac{1}{n}H(Q) \le \mathrm{Cut}_x - C_e + m\epsilon$ where $m$ is the number of edges in the graph. Since $(Q, K)$ is the collection of the random variables passing the edges of the cut, $X^n$ should be recoverable from $(Q, K, Y^n)$ with probability of error less than or equal to $\epsilon$. Thus, by Fano's inequality $\frac{1}{n}H(X^n|Q,K,Y^n) \le k_1(\epsilon)$ for some function $k_1(\epsilon)$ that converges to zero as $\epsilon$ converges to zero. We have

$$\frac{1}{n}H(K|X^n) \le \frac{1}{n}H(Q,K,Y^n|X^n)$$
$$= \frac{1}{n}H(Q,K,Y^n,X^n) - \frac{1}{n}H(X^n)$$
$$\le \frac{1}{n}H(Q) + \frac{1}{n}H(K) + H(Y) + \frac{1}{n}H(X^n|Q,K,Y^n) - H(X)$$
$$\le \mathrm{Cut}_x - C_e + H(Y) + m\epsilon + d_e - H(X) + k_1(\epsilon).$$

We get the first inequality by setting $k(\epsilon) = k_1(\epsilon) + m\epsilon$. For the second inequality note that

$$\frac{1}{n}H(K|X^n,Y^n) \le \frac{1}{n}H(Q,K|X^n,Y^n)$$
$$= \frac{1}{n}H(Q,K,Y^n,X^n) - \frac{1}{n}H(X^n,Y^n)$$
$$\le \frac{1}{n}H(Q) + \frac{1}{n}H(K) + H(Y) + \frac{1}{n}H(X^n|Q,K,Y^n) - H(X,Y)$$
$$\le \mathrm{Cut}_x - C_e + H(Y) + m\epsilon + d_e - H(X,Y) + k_1(\epsilon).$$

■

We can now write down the converse using the edge-cuts. We proceed in the same fashion as in deriving equations (1)-(3), using Lemma 1 (revisited) and Theorem 2. Lemma 1 (revisited) gives us upper bounds on the elements of the uncertainty vector, whereas Theorem 2 gives us lower bounds on these elements.

Cuts that have edge 2:

$$d_2 = I(E_2;XY) \le C_2$$
$$C_2 + C_4 - C_2 + d_2 + H(Y) - H(X) \ge I(E_2;Y|X)$$
$$C_2 + C_4 - C_2 + d_2 + H(Y) - H(X,Y) \ge 0$$
because {2,4} is cut($s_1$; $s_2$; $t_1, t_2$),
$$C_2 + C_3 + C_4 + C_5 - C_2 + d_2 - H(X,Y) \ge 0$$
because {2,3,4,5} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),

for some $p(e_2|x,y)$.
Cuts that have edge 3:

$$d_3 = I(E_3;XY) \le C_3$$
$$C_3 + C_5 - C_3 + d_3 + H(X) - H(Y) \ge I(E_3;X|Y)$$
$$C_3 + C_5 - C_3 + d_3 + H(X) - H(X,Y) \ge 0$$
because {3,5} is cut($s_2$; $s_1$; $t_1, t_2$),
$$C_2 + C_3 + C_4 + C_5 - C_3 + d_3 - H(X,Y) \ge 0$$
because {2,3,4,5} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),

for some $p(e_3|x,y)$.
Cuts that have edge 4:

$$d_4 = I(E_4;XY) \le C_4$$
$$C_2 + C_4 - C_4 + d_4 + H(Y) - H(X) \ge I(E_4;Y|X)$$
$$C_2 + C_4 - C_4 + d_4 + H(Y) - H(X,Y) \ge 0$$
because {2,4} is cut($s_1$; $s_2$; $t_1, t_2$),
$$C_2 + C_3 + C_4 + C_5 - C_4 + d_4 - H(X,Y) \ge 0$$
because {2,3,4,5} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),
$$C_4 + C_5 + C_6 - C_4 + d_4 - H(X,Y) \ge 0$$
because {4,5,6} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),
$$C_4 + C_7 - C_4 + d_4 - H(X) \ge I(E_4;Y|X)$$
because {4,7} is cut($s_1, s_2$; $\emptyset$; $t_1$),

for some $p(e_4|x,y)$.

Cuts that have edge 5:

$$d_5 = I(E_5;XY) \le C_5$$
$$C_3 + C_5 - C_5 + d_5 + H(X) - H(Y) \ge I(E_5;X|Y)$$
$$C_3 + C_5 - C_5 + d_5 + H(X) - H(X,Y) \ge 0$$
because {3,5} is cut($s_2$; $s_1$; $t_1, t_2$),
$$C_2 + C_3 + C_4 + C_5 - C_5 + d_5 - H(X,Y) \ge 0$$
because {2,3,4,5} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),
$$C_4 + C_5 + C_6 - C_5 + d_5 - H(X,Y) \ge 0$$
because {4,5,6} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),
$$C_5 + C_8 - C_5 + d_5 - H(Y) \ge I(E_5;X|Y)$$
because {5,8} is cut($s_1, s_2$; $\emptyset$; $t_2$),

for some $p(e_5|x,y)$.

Since the capacities of edges 6, 7 and 8 are all the same, we can assume that they are all carrying the same message. Therefore we can compute the uncertainty of the message on edge 6 by looking at cuts that include edge 7 or 8.

$$d_6 = I(E_6;XY) \le C_6$$
$$C_4 + C_5 + C_6 - C_6 + d_6 - H(X,Y) \ge 0$$
because {4,5,6} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),
$$C_4 + C_6 - C_6 + d_6 + H(Y) - H(X) \ge I(E_6;Y|X)$$
$$C_4 + C_6 - C_6 + d_6 + H(Y) - H(X,Y) \ge 0$$
because {4,6} is cut($s_1$; $s_2$; $t_1, t_2$),
$$C_5 + C_6 - C_6 + d_6 + H(X) - H(Y) \ge I(E_6;X|Y)$$
$$C_5 + C_6 - C_6 + d_6 + H(X) - H(X,Y) \ge 0$$
because {5,6} is cut($s_2$; $s_1$; $t_1, t_2$),
$$C_4 + C_6 - C_6 + d_6 - H(X) \ge I(E_6;Y|X)$$
because {4,7} is cut($s_1, s_2$; $\emptyset$; $t_1$),
$$C_5 + C_6 - C_6 + d_6 - H(Y) \ge I(E_6;X|Y)$$
because {5,8} is cut($s_1, s_2$; $\emptyset$; $t_2$),
$$C_4 + C_5 + C_7 + C_8 - C_6 + d_6 - H(X,Y) \ge 0$$
because {4,5,7,8} is cut($s_1, s_2$; $\emptyset$; $t_1, t_2$),

for some $p(e_6|x,y)$. After simplification and removal of redundant equations and noting that $C_6 = C_7 = C_8$, these inequalities can be written as follows:

From equations for edge 2:
$$I(E_2;X,Y) \le C_2 \tag{12}$$
$$C_4 \ge H(X,Y|E_2) - H(Y) \tag{13}$$
$$C_3 + C_4 + C_5 \ge H(X,Y|E_2) \tag{14}$$

From equations for edge 3:
$$I(E_3;X,Y) \le C_3 \tag{15}$$
$$C_5 \ge H(X,Y|E_3) - H(X) \tag{16}$$
$$C_2 + C_4 + C_5 \ge H(X,Y|E_3) \tag{17}$$

From equations for edge 4:
$$I(E_4;X,Y) \le C_4 \tag{18}$$
$$C_2 \ge H(X,Y|E_4) - H(Y) \tag{19}$$
$$C_2 + C_3 + C_5 \ge H(X,Y|E_4) \tag{20}$$
$$C_5 + C_6 \ge H(X,Y|E_4) \tag{21}$$
$$C_6 \ge H(X|E_4) \tag{22}$$

From equations for edge 5:
$$I(E_5;X,Y) \le C_5 \tag{23}$$
$$C_3 \ge H(X,Y|E_5) - H(X) \tag{24}$$
$$C_2 + C_3 + C_4 \ge H(X,Y|E_5) \tag{25}$$
$$C_4 + C_6 \ge H(X,Y|E_5) \tag{26}$$
$$C_6 \ge H(Y|E_5) \tag{27}$$

From equations for edge 6:
$$I(E_6;X,Y) \le C_6$$
$$C_4 + C_5 \ge H(X,Y|E_6) \tag{28}$$
$$C_4 \ge H(X|E_6) \tag{29}$$
$$C_5 \ge H(Y|E_6) \tag{30}$$

for some $p(e_2, e_3, e_4, e_5, e_6|x, y)$.
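The simplification step can be illustrated on the two edge-2 inequalities for the cut {2,4}. With $d_2 = I(E_2;XY)$, the inequality involving $I(E_2;Y|X)$ rearranges to $C_4 \ge H(X|E_2) - H(Y)$, which is dominated by $C_4 \ge H(X,Y|E_2) - H(Y)$, so only the latter is kept. The sketch below checks this numerically; the random joint pmf over $(e, x, y)$ and the variable names are assumptions made for illustration.

```python
import itertools
import math
import random

def marginal(pmf, keep):
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

rng = random.Random(2)
outcomes = list(itertools.product(range(3), range(2), range(2)))  # (e, x, y)
w = [rng.random() for _ in outcomes]
total = sum(w)
pmf = {o: wi / total for o, wi in zip(outcomes, w)}
E, X, Y = 0, 1, 2
H = lambda *idx: entropy(marginal(pmf, idx).values())

d2 = H(E) + H(X, Y) - H(E, X, Y)                       # I(E_2; X, Y)
i_e_y_given_x = H(E, X) + H(X, Y) - H(X) - H(E, X, Y)  # I(E_2; Y | X)

# Smallest C_4 permitted by each raw inequality (the C_2 terms cancel).
c4_from_conditional = i_e_y_given_x - d2 - H(Y) + H(X)
c4_from_joint = H(X, Y) - d2 - H(Y)

h_x_given_e = H(E, X) - H(E)
h_xy_given_e = H(E, X, Y) - H(E)
```

Here `c4_from_conditional` equals $H(X|E_2) - H(Y)$ and `c4_from_joint` equals $H(X,Y|E_2) - H(Y)$, with the second never smaller than the first.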

We claim that the minimum possible value of $C_6$ in this converse is less than or equal to $I(X;Y)$ if we restrict ourselves to networks where $C_2 + C_4 = H(X|Y)$. This is because the choice of $C_2 = 0$, $C_3 = H(Y)$, $C_4 = H(X|Y)$, $C_5 = H(X,Y)$ and $C_6 = I(X;Y)$ is a valid point in this converse region. To see this take $E_6$ in a way that $E_6 \to X \to Y$ forms a Markov chain, and furthermore $p(e_6|x) \sim p(y|x)$. Take $E_4$ in a way that $E_4 \to X \to Y$ forms a Markov chain, and furthermore $I(E_4;X) = H(X|Y)$. Take $E_5 = (X, Y)$, $E_3 = Y$ and $E_2 =$ constant. To verify these equations, it is useful to note that since $C_5 = H(X,Y)$ those equations involving $C_5$ will be automatically satisfied. Because $E_6 \to X \to Y$ forms a Markov chain and $p(e_6|x) \sim p(y|x)$, we have $I(E_6;X,Y) = I(E_6;X) = I(Y;X)$.
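The claim can be checked on a concrete example. The sketch below evaluates the simplified inequalities for the stated capacities and auxiliary choices on a hypothetical binary pmf $p(x,y)$; the pmf, the dictionary layout, and the variable names are assumptions. For $E_4$ only the two stated properties are used (the Markov chain and $I(E_4;X) = H(X|Y)$); one way to realize such an $E_4$, not taken from the text, is to reveal $X$ with probability $H(X|Y)/H(X)$ and erase it otherwise.

```python
import math

def h(probs):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Hypothetical joint pmf p(x, y) on binary alphabets, for illustration only.
p = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}
px = {a: p[a, 0] + p[a, 1] for a in (0, 1)}
py = {b: p[0, b] + p[1, b] for b in (0, 1)}
HX, HY, HXY = h(px.values()), h(py.values()), h(p.values())
HX_Y, IXY = HXY - HY, HX + HY - HXY

# E6 is a conditionally independent copy of Y given X: p(e6|x) = p(y=e6|x).
p_e6_y = {(e, b): sum(p[a, e] * p[a, b] / px[a] for a in (0, 1))
          for e in (0, 1) for b in (0, 1)}
HY_E6 = h(p_e6_y.values()) - HY  # H(E6) = H(Y): E6 has the marginal of Y

C = {2: 0.0, 3: HY, 4: HX_Y, 5: HXY, 6: IXY}

# I(E;X,Y) per edge: E2 constant, E3 = Y, E4 and E6 Markov via X, E5 = (X,Y).
I = {2: 0.0, 3: HY, 4: HX_Y, 5: HXY, 6: IXY}
HXY_E = {e: HXY - I[e] for e in I}  # H(X,Y|E) = H(X,Y) - I(E;X,Y)
HX_E4 = HX - HX_Y                   # H(X|E4) = H(X) - I(E4;X)
HX_E6 = HX - IXY                    # H(X|E6) = H(X) - I(E6;X)

checks = [  # pairs (lhs, rhs), each meaning lhs >= rhs
    (C[2], I[2]), (C[4], HXY_E[2] - HY), (C[3] + C[4] + C[5], HXY_E[2]),
    (C[3], I[3]), (C[5], HXY_E[3] - HX), (C[2] + C[4] + C[5], HXY_E[3]),
    (C[4], I[4]), (C[2], HXY_E[4] - HY), (C[2] + C[3] + C[5], HXY_E[4]),
    (C[5] + C[6], HXY_E[4]), (C[6], HX_E4),
    (C[5], I[5]), (C[3], HXY_E[5] - HX), (C[2] + C[3] + C[4], HXY_E[5]),
    (C[4] + C[6], HXY_E[5]), (C[6], 0.0),  # H(Y|E5) = 0 since E5 = (X,Y)
    (C[6], I[6]), (C[4] + C[5], HXY_E[6]), (C[4], HX_E6), (C[5], HY_E6),
]
all_hold = all(lhs >= rhs - 1e-9 for lhs, rhs in checks)
```

Several of the checks hold with equality for this choice, for example $C_4 \ge H(X,Y|E_2) - H(Y)$ and $C_6 \ge H(X|E_4)$, which is why a small tolerance is used in the comparison.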